Yielding Runner #41
This is a candidate implementation that allows class transforms to yield rows.
Kiba 2 requires Ruby 2.0+ for Enumerator::Lazy.
In 82bc8e5 I added a simple benchmark to compare runners. This is a very simplistic benchmark which does not reflect real-life use; I wanted to get a feel for how costly using
I will carry out more tests and will try to improve the alternate runner.
I've been using this on multiple internal reporting systems and things work just great. I will come back to refactor and clean up. Ideally, it would be great to figure out a way to selectively enable the (more costly) lazy enumerator for specific transforms, instead of fully switching the runner class.
Extract runner tests to allow upcoming reuse in #41
This cherry-picks from #41 but improves the syntax a bit.
This cherry-picks from #41 but with a more DRY code.
A much better choice than what was originally implemented in #41, since:
- it allows deciding which runner to pick on a per-ETL basis
- it will work inside Sidekiq (vs only on the command line)
Closing in favour of #44 (which is still a WIP but improved already).
More polished version is now available on
This PR contains a work-in-progress "alternate runner", which is actually a candidate runner for Kiba 2.
The new candidate runner brings 2 massive benefits.
Ability to yield multiple rows from a given class transform
The new runner allows "class transforms" (transforms written as classes, rather than blocks) to yield an arbitrary number of rows.
It allows the source presented in this blog post to be rewritten as a transform like this:
More importantly, it allows such transforms to be stacked, each yielding sub-rows.
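To illustrate the stacking idea, here is a minimal sketch in plain Ruby. `ExplodeLines`, `ExplodeWords` and the `run_transforms` helper are hypothetical names made up for this example, and the helper is a simplified stand-in rather than the runner from this PR. It assumes that a class transform exposes `process(row)`, may `yield` any number of rows, and that a non-nil return value is also passed downstream.

```ruby
# Hypothetical class transform: yields one row per line of text.
class ExplodeLines
  def process(row)
    row[:text].each_line do |line|
      yield({ file: row[:file], line: line.chomp })
    end
    nil # assumption: nil means "emit nothing besides the yielded rows"
  end
end

# Hypothetical class transform stacked on the previous one:
# yields one row per word of each line.
class ExplodeWords
  def process(row)
    row[:line].split.each do |word|
      yield({ file: row[:file], word: word })
    end
    nil
  end
end

# Simplified stand-in for a runner (not Kiba's code): feeds every row
# through each transform in turn, collecting yielded and returned rows.
def run_transforms(rows, transforms)
  transforms.reduce(rows) do |stream, transform|
    stream.flat_map do |row|
      output = []
      returned = transform.process(row) { |yielded| output << yielded }
      output << returned if returned
      output
    end
  end
end

rows = [{ file: 'a.txt', text: "hello world\nfoo" }]
result = run_transforms(rows, [ExplodeLines.new, ExplodeWords.new])
puts result.map { |r| r[:word] }.join(', ') # prints "hello, world, foo"
```

One input row becomes two line rows, which in turn become three word rows, without either transform knowing about the other.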
Increased ability to write reusable Kiba components
Let's pick an example 😄
Imagine you have created a Kiba source able to extract XML elements from a group of files on disk (with each file containing N elements).
It would typically look like this:
Such a class has 4 responsibilities:
With Kiba 1 you can achieve some level of splitting here by using the decorator technique I outlined here, but this can only take you so far.
With the new Kiba runner, you can first rewrite the code above as 4 independent components (1 source & 3 transforms):
Which can then be used:
While it can appear more complicated at first, each of these 4 components can now be mixed and matched with other components in completely unrelated scenarios.
For instance:
- `DirectoryLister` could be used to list anything (JSON files etc).
- `EnumerableExploder`, similarly, could be used for pretty much anything.

This opens the door to providing more composable & more reusable components to Kiba users, or as part of Kiba Common or Kiba Pro.
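As a rough sketch of how such components could be reused in an unrelated scenario, the example below lists JSON files instead of XML ones, using only the standard library. `DirectoryLister` and `EnumerableExploder` are the names mentioned above, but their bodies here are guesses; `JsonFileReader`, the `dir_pattern:` keyword and the `run_pipeline` helper are hypothetical, and the helper is a stand-in written for this example rather than Kiba's runner. It assumes sources respond to `each` and transforms to `process`, with yielded rows and non-nil return values both passed downstream.

```ruby
require 'json'
require 'tmpdir'

# Source: emits one row per file name matching a glob pattern.
class DirectoryLister
  def initialize(dir_pattern:)
    @dir_pattern = dir_pattern
  end

  def each
    Dir.glob(@dir_pattern).sort.each { |file| yield file }
  end
end

# Transform (hypothetical): replaces a file name with its parsed JSON content.
class JsonFileReader
  def process(file)
    JSON.parse(File.read(file))
  end
end

# Transform: yields each element of an enumerable row as its own row.
class EnumerableExploder
  def process(row)
    row.each { |item| yield item }
    nil # assumption: nil means "emit nothing besides the yielded rows"
  end
end

# Stand-in for a runner (not Kiba's code): chains the transforms and
# returns every resulting row as an array.
def run_pipeline(source, transforms)
  transforms.reduce(source.to_enum(:each)) do |stream, transform|
    Enumerator.new do |out|
      stream.each do |row|
        returned = transform.process(row) { |yielded| out << yielded }
        out << returned if returned
      end
    end
  end.to_a
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.json'), JSON.generate([{ 'id' => 1 }, { 'id' => 2 }]))
  File.write(File.join(dir, 'b.json'), JSON.generate([{ 'id' => 3 }]))

  rows = run_pipeline(
    DirectoryLister.new(dir_pattern: File.join(dir, '*.json')),
    [JsonFileReader.new, EnumerableExploder.new]
  )
  puts rows.length # prints 3: one row per JSON element across both files
end
```

Only the reader in the middle is format-specific; the lister and the exploder would work unchanged for any other file type.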
Notes on current implementation
The new implementation relies on nested `Enumerator::Lazy` instances.

I must still benchmark this behaviour in terms of performance compared to Kiba 1, and also improve the code a bit more, before being able to decide if this will remain in Kiba 2 or not.
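For readers unfamiliar with the technique, here is a minimal sketch of what nesting `Enumerator::Lazy` instances can look like. It is not the code from this PR: `lazy_transform`, `lazy_pipeline`, `Repeat` and `Upcase` are hypothetical names, and the handling of yielded versus returned rows is an assumption. Each transform wraps the previous stream in a new lazy enumerator, so rows are only pulled from the source when something downstream asks for them.

```ruby
# Wraps a stream in a lazy enumerator that applies one transform.
# Assumption: yielded rows and a non-nil return value are both emitted.
def lazy_transform(stream, transform)
  Enumerator::Lazy.new(stream) do |yielder, row|
    returned = transform.process(row) { |yielded| yielder << yielded }
    yielder << returned if returned
  end
end

# Nests one lazy enumerator per transform around the source rows.
def lazy_pipeline(source_rows, transforms)
  transforms.reduce(source_rows.lazy) do |stream, transform|
    lazy_transform(stream, transform)
  end
end

# Hypothetical transform yielding several rows per input row.
class Repeat
  def initialize(times)
    @times = times
  end

  def process(row)
    @times.times { |i| yield "#{row}-#{i}" }
    nil
  end
end

# Hypothetical transform returning a single row, block-style.
class Upcase
  def process(row)
    row.upcase
  end
end

read = [] # records which rows were actually pulled from the source
source = Enumerator.new do |y|
  %w[a b c].each do |row|
    read << row
    y << row
  end
end

pipeline = lazy_pipeline(source, [Repeat.new(2), Upcase.new])
p read.empty?       # => true, nothing has been read yet
p pipeline.first(3) # => ["A-0", "A-1", "B-0"]
p read              # => ["a", "b"], row "c" was never pulled
```

The extra indirection at every level is the kind of overhead that a benchmark against an eager runner would need to measure.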